transformer encoder
0e0157ce5ea15831072be4744cbd5334-Supplemental-Conference.pdf
A.1 Dataset Details & Evaluation Metrics
As stated earlier, the main applications of Extreme Multi-label Text Classification are in e-commerce (product recommendation and dynamic search advertisement) and in document tagging, where the objective of an algorithm is to recommend or advertise correctly within the top-k slots. We therefore evaluate the methods with precision at k (denoted P@k) and its propensity-scored variant (denoted PSP@k) [17]. These are standard, widely used metrics in the XMC community [4]. Since P@k treats all labels equally, it does not reveal the performance of the model on tail labels. However, because of the long-tailed label distribution in XMC datasets, one of the main challenges is to predict tail labels correctly, which may be more valuable and informative than head classes.
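To make the two metrics concrete, the following is a minimal sketch for a single test instance. It assumes the commonly used unnormalized form of PSP@k, in which each correctly predicted label in the top k contributes the inverse of its propensity; the excerpt itself does not spell out this formula. The function names and the propensity values are hypothetical and chosen only for illustration; in practice the propensities are estimated from label frequencies.

```python
def top_k(scores, k):
    # Indices of the k highest-scoring labels (ties broken by lower index).
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]


def precision_at_k(scores, relevant, k):
    # P@k: fraction of the top-k predicted labels that are relevant.
    hits = sum(1 for i in top_k(scores, k) if i in relevant)
    return hits / k


def psp_at_k(scores, relevant, propensity, k):
    # PSP@k (assumed unnormalized form): each hit is weighted by 1 / propensity,
    # so a correct tail label (low propensity) counts more than a head label.
    gain = sum(1.0 / propensity[i] for i in top_k(scores, k) if i in relevant)
    return gain / k


# Hypothetical example: 4 labels, labels 0 (head) and 3 (tail) are relevant.
scores = [0.9, 0.1, 0.8, 0.3]
relevant = {0, 3}
propensity = [0.8, 0.5, 0.6, 0.1]  # illustrative values, not from the paper

p2 = precision_at_k(scores, relevant, 2)
psp2 = psp_at_k(scores, relevant, propensity, 2)
psp3 = psp_at_k(scores, relevant, propensity, 3)
```

With k = 2 only the head label 0 is retrieved, so both metrics stay small. With k = 3 the tail label 3 enters the top k and dominates PSP@k, which is the sense in which the propensity-scored variant exposes tail-label performance that P@k hides.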
The Power of Hard Attention Transformers on Data Sequences: A formal language theoretic perspective
Formal language theory has recently been successfully employed to unravel the power of transformer encoders. This setting is primarily applicable in Natural Language Processing (NLP), as a token embedding function (where a bounded number of tokens is admitted) is first applied before feeding the input to the transformer.
Multimodal Adversarial Attacks on Vision-Language Tasks via Pre-trained Models. Ziyi Yin, Muchao Ye
Vision-Language (VL) pre-trained models have shown their superiority on many multimodal tasks. However, the adversarial robustness of such models has not been fully explored. Existing approaches mainly focus on exploring the adversarial robustness under the white-box setting, which is unrealistic. In this paper, we aim to investigate a new yet practical task to craft image and text perturbations using pre-trained VL models to attack black-box fine-tuned models on different downstream tasks.